Abstract.


This report describes a prototype 3-D object recognition system
composed of three major sections: 1) an object representation module,
2) a feature extraction and matching module, and 3) a recognition
control strategy module. The object representation module uses an
implementation of a newly developed algorithm for the construction of
perspective projection aspect graphs of convex polyhedra. The feature
extraction and matching module implements a new method of using
Fourier Descriptors to characterize the complete 2-D projection of an
object. The recognition control strategy module uses the aspect graph
object representation to control the application of a constrained
optimization algorithm. The system is implemented in C on a SUN
workstation, and some simple recognition experiments have been carried
out to demonstrate the validity of the overall concept. The
limitations of the system suggest several important avenues for
future research.

1. Introduction

The long-term goal of our research group (dubbed "ERRORS" for
"Environment for Related Research in Object Recognition Systems") is
to construct a 3-D object recognition system with a useful level of
practical competence. For our current work, 3-D object recognition
means recognizing the identity of the object and having some estimate
of its parameters of location and orientation. The design of a 3-D
object recognition system must generally take into account three
interrelated concerns: 1) object representation, 2) feature extraction
and matching, and 3) recognition control strategy. Accordingly, our
current system is structured into three major modules, reflecting the
combined work of several people. The basic results underlying each of
these modules are outlined in sections 2, 3 and 4. Section 5 describes
the results of some experiments which demonstrate the basic
capabilities of the system. Section 6 outlines the limitations of the
current system, and suggests areas of further research to produce a
new system of greater competence.

*This work is supported by Air Force Office of Scientific Research
grant AFOSR-87-0316.


2. Object Representation Using Aspect Graphs

Traditional approaches to 3-D object recognition can be characterized
as either view-dependent or view-independent, according to whether
the object representation used during recognition is some set of
standard 2-D views or a true 3-D model, respectively [1, 4]. The
aspect graph concept, originally introduced by Koenderink and van
Doorn [12] as a possible mechanism involved in human vision, can be
thought of as a hybrid representation which combines a true 3-D model
with an enumeration of all the fundamentally different views of the
object. The potential power of the aspect graph concept has been
quickly and widely recognized in computer vision, and a number of
researchers have recently described algorithms for creating
representations related to the aspect graph.

One major distinction between the different representations which
have been proposed is whether or not they depend on the assumption of
orthographic projection. If orthographic projection is assumed, then
the "cell" of viewing space associated with a node of the aspect
graph corresponds to a surface area on the Gaussian sphere [2, 3,
7-9, 11, 13, 15, 17]. The important distinction with regard to our
work is that this type of orthographic projection aspect graph only
captures changes in the aspect which are due to changes in viewing
orientation. Thus, it cannot be used for recognition which includes
estimating the parameters of translation.

Relatively less work has been done on algorithms for constructing the
perspective projection aspect graph. The first algorithm for
constructing the perspective projection aspect graph of 3-D objects
was developed by Stewman and Bowyer [21], and is applicable only to
convex polyhedra with trihedral vertices. This algorithm was
subsequently extended to handle general convex polyhedra [22].
Complete details of the algorithm and an analysis of the time and
space complexity appear in [23]. We have since learned of related
work by two other researchers: Watts [26] has described an algorithm
for constructing the aspect graph of convex polyhedra using a
sweep-plane paradigm, and Edelsbrunner [5] has described an algorithm
for constructing the geometric incidence lattice representing the
arrangement of planes in space. The geometric incidence lattice is a
different and more abstract entity than the aspect graph, but the
algorithm could easily be extended to create the aspect graph.

Our algorithm to create the aspect graph begins by finding all the
lines and points of intersection between the planes in which the
faces of the object lie. The algorithm then uses all these points of
intersection, plus additional points on the infinite extensions of
each line, to isolate groups of points on the boundary of individual
3-D "viewing cell" volumes. Finally, the algorithm constructs an
explicit aspect graph structure and writes the resulting
representation to disk. An example aspect graph for one of the
objects used in our experiments is shown in Figure 1. The user
interface to the implemented algorithm is described in [24]. The
important elements of the representation for our purposes here are
that each node of the aspect graph is attributed with 1) a definition
of the corresponding 3-D cell of viewing space, 2) a definition of
which faces are visible from viewpoints in that cell, and 3) the
coordinates of a "central viewpoint" in the cell. In addition, the
aspect graph is arranged by levels, where each node in a given level
has the same number of visible faces.
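
The node attributes and level arrangement described above can be
pictured with a small data structure. The following Python sketch is
illustrative only: the class names, the half-space encoding of a
viewing cell, and the `neighbors` field are assumptions made for this
example and are not taken from the implemented system, which is
written in C.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AspectNode:
    """One aspect graph node; the field layout here is hypothetical."""
    # Bounding half-spaces of the 3-D viewing cell, each (a, b, c, d, sign):
    # a point is inside when sign * (a*x + b*y + c*z + d) > 0 for all of them.
    cell_planes: list
    visible_faces: frozenset      # labels of the faces visible from the cell
    central_viewpoint: tuple      # (x, y, z) of a representative viewpoint
    neighbors: list = field(default_factory=list)  # assumed adjacency list

    def contains(self, point):
        x, y, z = point
        return all(s * (a * x + b * y + c * z + d) > 0
                   for a, b, c, d, s in self.cell_planes)


class AspectGraph:
    """Nodes grouped into levels keyed by the number of visible faces."""

    def __init__(self):
        self.nodes = []
        self.levels = defaultdict(list)

    def add_node(self, node):
        index = len(self.nodes)
        self.nodes.append(node)
        self.levels[len(node.visible_faces)].append(index)
        return index

    def nodes_at_level(self, n_faces):
        return [self.nodes[i] for i in self.levels.get(n_faces, [])]


# Two of the viewing cells of the unit cube [0, 1]^3.
graph = AspectGraph()
graph.add_node(AspectNode(
    cell_planes=[(1, 0, 0, -1, 1), (0, 1, 0, -1, 1), (0, 0, 1, -1, 1)],
    visible_faces=frozenset({"+x", "+y", "+z"}),
    central_viewpoint=(2.0, 2.0, 2.0)))
graph.add_node(AspectNode(
    cell_planes=[(1, 0, 0, -1, 1), (0, 1, 0, 0, 1), (0, 1, 0, -1, -1),
                 (0, 0, 1, 0, 1), (0, 0, 1, -1, -1)],
    visible_faces=frozenset({"+x"}),
    central_viewpoint=(2.0, 0.5, 0.5)))
```

Grouping node indices by face count makes it cheap to retrieve every
node with a given number of visible faces, for example
`graph.nodes_at_level(3)` for all three-face aspects.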

3. Feature Matching Using Fourier Descriptors

Fourier Descriptors (FDs) have been used for some time in both 2-D
and 3-D object recognition. Several researchers have applied the
technique for 3-D object recognition by using the FDs of the boundary
outline of the 2-D projection of the 3-D object [16, 25]. While this
technique has had some success, there are clearly some objects which
could not be distinguished by only the boundary outlines of their 2-D
projections. Eggert and Bowyer [6] have developed a method of using
FDs to describe the shape of the internal detail of the 2-D
projection of an object as well as its boundary outline. The feature
extraction and matching module of our current system uses this
method.

The feature matching module undergoes an initialization step which
involves processing the original image. The module extracts the line
drawing of the object in the image, selects a unique subset of all
the possible circuits in the line drawing, and calculates the FDs for
each circuit in this unique subset. For convex polyhedra, each
circuit in the chosen subset corresponds to a face of the object.
Thus this feature extraction and matching strategy seems particularly
appropriate to the class of objects for which we are currently able
to create the aspect graph representation.

On subsequent invocations of the feature extraction and matching
module, the parameters of translation and orientation of an object
model are given as input. The module calculates a perspective
projection line drawing of the object model from the visible faces
attributed to the aspect graph node, extracts the unique set of
circuits from the line drawing, calculates the FDs for each circuit
in the unique set, pairs up the circuits from this line drawing with
those from the original image, and determines the best match for the
set of circuits. The algorithm reports a figure of merit for the
match, along with the (2-D) rotation and scale used to obtain the
best match. The figure of merit for the match is the sum of the
squares of the differences of the FDs for each of the circuit pairs.
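
As a rough illustration of a sum-of-squares figure of merit over
circuit pairs, the Python sketch below computes Fourier Descriptors
of closed circuits and compares them. It is a simplification under
stated assumptions, not the method of [6]: it keeps only descriptor
magnitudes (which discards the 2-D rotation rather than estimating
it, as the module described above does), it assumes the circuits are
already paired and sampled with the same number of points in the same
traversal direction, and the function names are invented for the
example.

```python
import cmath
import math


def fourier_descriptors(points, n_harmonics=3):
    """Normalized FD magnitudes of a closed circuit sampled as (x, y) points."""
    z = [complex(x, y) for x, y in points]
    n = len(z)
    harmonics = [k for k in range(-n_harmonics, n_harmonics + 1) if k != 0]
    mags = []
    for k in harmonics:  # skipping k = 0 removes the translation term
        c = sum(p * cmath.exp(-2j * math.pi * k * m / n)
                for m, p in enumerate(z)) / n
        mags.append(abs(c))  # the magnitude discards rotation and start point
    scale = max(mags)  # assumes a non-degenerate circuit (scale > 0)
    return [v / scale for v in mags]  # dividing by the largest removes scale


def figure_of_merit(image_circuits, model_circuits, n_harmonics=3):
    """Sum of squared FD differences over already-paired circuits."""
    total = 0.0
    for img, mdl in zip(image_circuits, model_circuits):
        a = fourier_descriptors(img, n_harmonics)
        b = fourier_descriptors(mdl, n_harmonics)
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return total


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
rect = [(0, 0), (2, 0), (4, 0), (4, 1), (4, 2), (2, 2), (0, 2), (0, 1)]
turn = 3 * cmath.exp(1j * math.pi / 6)  # scale by 3, rotate by 30 degrees
moved = [((turn * complex(x, y)).real + 5, (turn * complex(x, y)).imag - 2)
         for x, y in square]

same = figure_of_merit([square], [moved])      # same shape, different pose
different = figure_of_merit([square], [rect])  # different shapes
```

In this sketch `same` is zero up to rounding error, because
translation, rotation, and scale have all been normalized away, while
`different` is clearly positive.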

4. Recognition Using Nonlinear Optimization

Several previous researchers have applied nonlinear optimization as a
control strategy for 3-D object recognition [10, 14, 25]. The
fundamental problems encountered are 1) how to choose starting
parameter estimates, and 2) how to know when the global minimum has
been found. Stark and Bowyer [18] suggested a method of using the
aspect graph representation to control the application of nonlinear
optimization in a way that avoids these problems. The basic idea
behind this method is fairly simple. Assume that we have a database
in which objects are represented by perspective projection aspect
graphs, and are given an image in which an unknown object appears at
an unknown orientation and translation. Since each node corresponds
to a different aspect of an object and the cell of space from which
that aspect is seen, we can generate a separate optimization solution
for each node in the database. Thus, for each node, we get the
minimum found within the corresponding viewing cell. We then select
the minimum across all nodes as the recognized view of the unknown
object in the image.
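
The control strategy just described reduces to an outer loop over
nodes around an inner, cell-constrained minimization. The Python
sketch below shows only that outer structure on toy data. Everything
in it is hypothetical: the viewing cell is collapsed to a 1-D
interval, the objective is a made-up scalar feature difference, and
an exhaustive grid search stands in for the nonlinear optimizer.

```python
def recognize(database, objective, minimize_in_cell):
    """Run one optimization per aspect node and keep the overall minimum."""
    best = None
    for object_name, nodes in database.items():
        for node in nodes:
            value, params = minimize_in_cell(objective, object_name, node)
            if best is None or value < best[0]:
                best = (value, object_name, node["name"], params)
    return best


def grid_minimize(objective, object_name, node, steps=200):
    """Stand-in optimizer: exhaustive search over a 1-D viewing cell."""
    lo, hi = node["cell"]
    samples = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min((objective(object_name, t), t) for t in samples)


# Toy data: each model predicts one scalar feature from a 1-D viewpoint t.
models = {"cube": lambda t: t, "block": lambda t: 2.0 + t}
database = {
    "cube": [{"name": "cube-a", "cell": (0.0, 1.0)}],
    "block": [{"name": "block-a", "cell": (0.0, 1.0)}],
}
observed = models["cube"](0.3)  # "image" of the cube seen from t = 0.3


def objective(object_name, t):
    return (models[object_name](t) - observed) ** 2


best = recognize(database, objective, grid_minimize)
```

Because every node starts its own search inside its own cell, the
choice of starting estimate is tied to the node, and the global
answer is simply the smallest of the per-cell minima.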

Stark, Eggert and Bowyer describe the specific implementation of this
general approach used in our current system [19]. Since the aspect
graphs are arranged by levels corresponding to the number of visible
faces, only nodes whose number of visible faces matches the number of
circuits found in the original image are considered as candidates by
the recognition process. For each candidate node, the optimization
algorithm proceeds as follows. First, initial estimates of the
parameters of translation [X, Y, Z] and rotation [Rx, Ry, Rz] are
constructed. The feature matching module is invoked to create a
projected line drawing, extract and match features, and return the
figure of merit for the match as the value of the objective function,
as well as the rotation and scale used in the 2-D match. The 2-D
rotation and scale values are used immediately to update the Z and Rz
viewpoint parameter estimates. Then finite differences are calculated
for X, Y, Rx, and Ry, and step values in these four parameters are
derived using damped least squares. This results in a complete set of
parameters for a new viewpoint.

If the new viewpoint is within the viewing cell, and the objective
function decreases at the new viewpoint, then the Z and Rz viewpoint
parameter estimates are updated according to the new 2-D rotation and
scale values returned from the last match, new finite differences are
calculated, and new step values determined for X, Y, Rx, and Ry. If
the value of the objective function goes below a zero tolerance, then
the process terminates. If the new viewpoint would be outside the
bounds of the cell, or if the objective function would increase at
the new viewpoint, then the damping factor is increased and new step
values for X, Y, Rx, and Ry are calculated.
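
The accept/reject logic above follows the general pattern of damped
least squares with a feasibility test. The Python sketch below
illustrates that pattern on a two-parameter toy problem. It is not
the system's optimizer: it assumes the objective is available as a
vector of residuals, it omits the direct update of Z and Rz from the
2-D match, and it lowers the damping factor after an accepted step,
which is a common convention that the text above does not state. All
names and constants are chosen for the example.

```python
def solve(a, b):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def damped_least_squares(residuals, p0, in_cell, tol=1e-12, damping=1e-3,
                         h=1e-6, max_iter=100):
    """Minimize sum(residuals(p)**2) while keeping p inside the cell."""
    p = list(p0)
    r = residuals(p)
    cost = sum(v * v for v in r)
    n = len(p)
    for _ in range(max_iter):
        if cost < tol:  # objective below the zero tolerance: terminate
            break
        cols = []  # forward finite differences, one Jacobian column each
        for j in range(n):
            q = p[:]
            q[j] += h
            cols.append([(a - b) / h for a, b in zip(residuals(q), r)])
        jtj = [[sum(x * y for x, y in zip(cols[i], cols[j]))
                for j in range(n)] for i in range(n)]
        jtr = [sum(x * y for x, y in zip(cols[i], r)) for i in range(n)]
        while damping < 1e12:
            a = [[jtj[i][j] + (damping if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
            step = solve(a, [-g for g in jtr])
            trial = [pi + si for pi, si in zip(p, step)]
            if in_cell(trial):
                r_trial = residuals(trial)
                c_trial = sum(v * v for v in r_trial)
                if c_trial < cost:  # inside the cell and improving: accept
                    p, r, cost = trial, r_trial, c_trial
                    damping = max(damping / 10.0, 1e-9)  # assumed relaxation
                    break
            damping *= 10.0  # outside the cell or no decrease: damp harder
        else:
            break  # no acceptable step at any damping level
    return p, cost


def residuals(p):
    return [p[0] - 0.3, 10.0 * (p[1] - p[0] ** 2)]


def in_unit_cell(p):
    return all(0.0 <= v <= 1.0 for v in p)


def in_far_cell(p):
    return 0.5 <= p[0] <= 1.0 and 0.0 <= p[1] <= 1.0


p_hit, cost_hit = damped_least_squares(residuals, [0.5, 0.5], in_unit_cell)
p_miss, cost_miss = damped_least_squares(residuals, [0.8, 0.8], in_far_cell)
```

The first call searches a cell that contains the true parameters and
drives the cost toward zero. The second is confined to a cell that
excludes them, so it returns a point inside that cell with a residual
cost bounded away from zero, which is the behavior the per-node
comparison relies on.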


5. Performance of the Current System

In order to assess the effectiveness of this approach to 3-D object
recognition, several simulated recognition experiments were carried
out [19]. One of the experiments attempts to assess how well the
approach does at recognizing the correct object by selecting the
lowest objective function value across a set of candidate aspects.
For this experiment, we randomly generated 25 simulated viewpoints
for each of 5 different three-face aspects taken from 4 different
object models (see Figures 1 and 2). For each of the simulated
viewpoints, we started up the optimization process for each of the 5
aspects. Thus, for each simulated viewpoint, we have one optimization
result found in the correct viewing cell and 4 results found in
incorrect viewing cells. The results are summarized in Table 1. Three
of the aspects matched correctly in all 25 trials. One aspect matched
correctly in 24 out of 25 trials; one three-face view of the cube was
mistaken for a three-face view of the rectangular block. The
remaining aspect matched correctly in 23 out of 25 trials; twice a
three-face view of the rectangular block was mistaken for a
three-face view of the truncated wedge. The two incorrect matches
involving the truncated wedge occur when the outline of the third
face, which is the only significant difference between the two
aspects, has collapsed to nearly a single line. Thus, these would
always be difficult viewpoints to handle correctly. The other
incorrect match appears to be an anomaly where the method simply
results in an incorrect choice for recognition.

6. Summary and Suggestions for Future Work

We have developed an algorithm for the construction of perspective
projection aspect graphs of convex polyhedra, developed a new method
of using Fourier Descriptors to characterize the complete line
drawing of such objects, and formulated a methodology which uses the
aspect graph to apply optimization techniques for recognition. Our
current prototype system pulls these results together to provide a
demonstration of the potential of aspect graph based 3-D object
recognition. Much more work must be done in several areas before a
system with a truly interesting level of practical competence can be
constructed.

One major line of continued research is constructing the aspect graph
for a larger class of objects. We are currently working on an
algorithm for nonconvex polyhedra. After that, we plan to investigate
an algorithm for objects defined as a CSG construction of spheres,
cones, cylinders, and blocks. Ideally, this algorithm will be able to
take the output of existing solid modeling systems as its input.
Because the worst-case size of an aspect graph is so large, O(N^3)
for convex polyhedra, we are also investigating concepts of node
equivalence which may be used to reduce the effective complexity
[20]. We are also interested in developing strategies to choose a
succession of views to recognize an object which cannot be
distinguished from the information in an initial view. Lastly, we
plan to investigate using different types of features, such as
"non-accidental properties," to create a more robust feature matching
strategy.